Load all libraries used in the time-series analysis
library(tidyverse)
## ── Attaching packages ──────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2 ✓ purrr 0.3.4
## ✓ tibble 3.0.1 ✓ dplyr 1.0.0
## ✓ tidyr 1.1.0 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## Warning: package 'readr' was built under R version 4.0.2
## ── Conflicts ─────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(tidyquant)
## Warning: package 'tidyquant' was built under R version 4.0.2
## Loading required package: lubridate
## Warning: package 'lubridate' was built under R version 4.0.2
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
## Loading required package: PerformanceAnalytics
## Loading required package: xts
## Warning: package 'xts' was built under R version 4.0.2
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 4.0.2
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Attaching package: 'xts'
## The following objects are masked from 'package:dplyr':
##
## first, last
##
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:graphics':
##
## legend
## Loading required package: quantmod
## Warning: package 'quantmod' was built under R version 4.0.2
## Loading required package: TTR
## Warning: package 'TTR' was built under R version 4.0.2
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## Version 0.4-0 included new data defaults. See ?getSymbols.
## ══ Need to Learn tidyquant? ═════════════════════════════════════════════════════════════════════════════════════════
## Business Science offers a 1-hour course - Learning Lab #9: Performance Analysis & Portfolio Optimization with tidyquant!
## </> Learn more at: https://university.business-science.io/p/learning-labs-pro </>
library(modelr)
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 4.0.2
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
library(grid)
library(ggplot2)
library(lubridate)
library(xts)
library(dplyr)
library(plotly)
## Warning: package 'plotly' was built under R version 4.0.2
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(hrbrthemes)
## NOTE: Either Arial Narrow or Roboto Condensed fonts are required to use these themes.
## Please use hrbrthemes::import_roboto_condensed() to install Roboto Condensed and
## if Arial Narrow is not on your system, please see https://bit.ly/arialnarrow
library(dygraphs)
## Warning: package 'dygraphs' was built under R version 4.0.2
library(htmlwidgets)
## Warning: package 'htmlwidgets' was built under R version 4.0.2
Load the time-series data from a comma-separated text file
global = read.delim('/users/rhome/globaltemp.txt', stringsAsFactors = FALSE, header = TRUE, sep = ",")
Display the column names
names(global)
## [1] "dt"
## [2] "LandAverageTemperature"
## [3] "LandAverageTemperatureUncertainty"
## [4] "LandMaxTemperature"
## [5] "LandMaxTemperatureUncertainty"
## [6] "LandMinTemperature"
## [7] "LandMinTemperatureUncertainty"
## [8] "LandAndOceanAverageTemperature"
## [9] "LandAndOceanAverageTemperatureUncertainty"
Check the class of the date column
class(global$dt)
## [1] "character"
Convert the column to the Date class
dateOnly = as.Date(global$dt)
class(dateOnly)
## [1] "Date"
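`as.Date()` parses ISO `YYYY-MM-DD` strings without extra arguments, which is why the conversion above works. A minimal sketch, using made-up strings rather than the actual file, of passing `format` explicitly so that a differently formatted value shows up as NA instead of being misread:

```r
# Hypothetical sample strings; the real column is global$dt
iso_strings <- c("1750-01-01", "1750-02-01", "2015-12-01")
parsed <- as.Date(iso_strings, format = "%Y-%m-%d")

# A string in another layout does not match the format and becomes NA
other <- as.Date("Jan 1, 1750", format = "%Y-%m-%d")

class(parsed)
range(parsed)
is.na(other)
```

Checking `anyNA()` on the converted column right after the conversion is a cheap way to catch rows that did not parse.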
Let’s check the first six dates
head(dateOnly)
## [1] "1750-01-01" "1750-02-01" "1750-03-01" "1750-04-01" "1750-05-01"
## [6] "1750-06-01"
Let’s check the last six dates
tail(dateOnly)
## [1] "2015-07-01" "2015-08-01" "2015-09-01" "2015-10-01" "2015-11-01"
## [6] "2015-12-01"
An overview of the summary statistics for each column
summary(global)
## dt LandAverageTemperature LandAverageTemperatureUncertainty
## Length:3192 Min. :-2.080 Min. :0.0340
## Class :character 1st Qu.: 4.312 1st Qu.:0.1867
## Mode :character Median : 8.611 Median :0.3920
## Mean : 8.375 Mean :0.9385
## 3rd Qu.:12.548 3rd Qu.:1.4192
## Max. :19.021 Max. :7.8800
## NA's :12 NA's :12
## LandMaxTemperature LandMaxTemperatureUncertainty LandMinTemperature
## Min. : 5.90 Min. :0.0440 Min. :-5.407
## 1st Qu.:10.21 1st Qu.:0.1420 1st Qu.:-1.335
## Median :14.76 Median :0.2520 Median : 2.950
## Mean :14.35 Mean :0.4798 Mean : 2.744
## 3rd Qu.:18.45 3rd Qu.:0.5390 3rd Qu.: 6.779
## Max. :21.32 Max. :4.3730 Max. : 9.715
## NA's :1200 NA's :1200 NA's :1200
## LandMinTemperatureUncertainty LandAndOceanAverageTemperature
## Min. :0.0450 Min. :12.47
## 1st Qu.:0.1550 1st Qu.:14.05
## Median :0.2790 Median :15.25
## Mean :0.4318 Mean :15.21
## 3rd Qu.:0.4582 3rd Qu.:16.40
## Max. :3.4980 Max. :17.61
## NA's :1200 NA's :1200
## LandAndOceanAverageTemperatureUncertainty
## Min. :0.0420
## 1st Qu.:0.0630
## Median :0.1220
## Mean :0.1285
## 3rd Qu.:0.1510
## Max. :0.4570
## NA's :1200
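The NA counts that `summary()` reports per column can also be computed directly. A small sketch on a toy data frame (the values are invented and only two temperature columns are included) that counts missing values per column and the share of fully observed rows:

```r
# Toy data frame standing in for `global` (values are invented)
toy <- data.frame(
  dt = c("1750-01-01", "1750-02-01", "1750-03-01", "1750-04-01"),
  LandAverageTemperature = c(3.0, NA, 5.6, 8.5),
  LandMaxTemperature = c(NA, NA, NA, 14.2),
  stringsAsFactors = FALSE
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
na_counts

# Share of rows where both temperature columns are observed
temp_cols <- c("LandAverageTemperature", "LandMaxTemperature")
complete_share <- mean(complete.cases(toy[, temp_cols]))
complete_share
```

Applied to the real data, the same two calls would show at a glance which columns have long unobserved stretches.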
The summary shows NA values in the data. We examine three columns: land average temperature, land maximum temperature, and land and ocean average temperature. Starting with land average temperature: 12 values are NA, the minimum is -2.080, the maximum is 19.021, the median is 8.611, and the mean is 8.375. The mean and median are close, which suggests little skew, and 12 NA values out of 3192 rows are too few to change that. The first quartile (4.312), median (8.611), and third quartile (12.548) are spaced about 4 apart (4.30 and 3.94), so the middle of the distribution is roughly symmetric. That is consistent with a normal distribution but does not prove one. Below is the distribution of the data in a scatter plot.
plot(global$LandAverageTemperature)
par(mfrow=c(1,1),bg="lavender")
barplot(global$LandAverageTemperature,col="lightpink",ylim=c(0,20),ylab="temperature",xlab="Land average temperature distribution")
The bar plot above shows the values spread fairly evenly across the range. There are too many bars for the pink fill to show; limiting the number of values plotted makes it visible.
hist(global$LandAverageTemperature, col = "pink")
Below is an interactive plot of the land average temperature
p1 <- global %>%
  ggplot(aes(x = dateOnly, y = LandAverageTemperature)) +
  geom_area(fill = "#69b3a2", alpha = 0.5) +
  geom_line(color = "#69b3a2") +
  ylab("landAverageTemperature") +
  theme_ipsum()
p1 = ggplotly(p1)
## Warning: Removed 12 rows containing missing values (position_stack).
p1
The interactive graph shows that the highest land average temperature, 19.021, was recorded on 1761-07-01, and the lowest, -2.080, on 1768-01-01. July values were 15.868 on 1750-07-01, 14.492 on 1850-07-01, 14.140 on 1950-07-01, and 15.051 on 2015-07-01, so the July average stays mostly in the range of 14 to 15, with a rise between 1950 and 2015. I read the overall trend as upward because the monthly minimum has not fallen below zero since 1838, when it was -0.057: winters have become milder, which lifts the overall average.
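The extreme values read off the interactive graph can also be located in code. A minimal sketch with invented values (not the real series) that finds the dates of the highest and lowest reading:

```r
# Invented monthly values standing in for global$LandAverageTemperature
dates <- seq(as.Date("1750-01-01"), by = "month", length.out = 6)
temps <- c(3.0, 3.1, NA, 8.5, 11.6, 12.9)

# which.max() and which.min() skip NA values and return the first matching position
hottest <- dates[which.max(temps)]
coldest <- dates[which.min(temps)]

hottest
coldest
```

With the real data, `dates` would be `dateOnly` and `temps` the temperature column of interest.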
Below is the land average temperature as a monthly time series
global.timeseries = ts(global$LandAverageTemperature, start = c(1750, 1), frequency = 12)  # monthly data starting January 1750
plot(global.timeseries)
The time-series plot shows the observed record rather than a model forecast, and it suggests a rising trend in land average temperature, which is not good for living creatures on Earth. The Paris climate agreement aims to hold warming well below 2 degrees Celsius above pre-industrial levels and to pursue a limit of 1.5 degrees; beyond that range, conditions are likely to become difficult for all creatures, not only humans.
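The `start` argument of `ts()` expects `c(year, period)`, and `frequency` is the number of observations per year. A small sketch on synthetic data, not the temperature file, of how a monthly series is indexed, and of why an unquoted date such as `1850-12-01` does not behave as a date:

```r
# R evaluates an unquoted 1850-12-01 as subtraction, not as a date
1850-12-01   # 1837

# Synthetic monthly series: 24 values starting January 1750 (invented data)
monthly <- ts(seq_len(24), start = c(1750, 1), frequency = 12)

start(monthly)      # first year and period
end(monthly)        # last year and period
frequency(monthly)  # observations per year

# window() extracts a sub-period by (year, month)
w <- window(monthly, start = c(1751, 1), end = c(1751, 3))
w
```

With `frequency = 12` the x-axis of `plot()` is in calendar years, so the plot can be compared directly with the dates in the interactive graphs.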
Let’s discuss land maximum temperature. The lowest of the recorded maximum temperatures is 5.90 and the highest is 21.32. There are 1200 NA values. The median (14.76) and mean (14.35) are close, which suggests only mild skew. Let’s visualize:
plot(global$LandMaxTemperature)
par(mfrow=c(1,1),bg="cornsilk")
barplot(global$LandMaxTemperature,col="lightpink",ylim=c(0,20),ylab="temperature",xlab="Land max temperature distribution")
hist(global$LandMaxTemperature, col = "pink")
In the scatter plot the data are spread widely and appear to fall into five similar bands. Because the data are dense, let’s ignore the NA values. The histogram shows that the data are skewed.
p <- global %>%
  ggplot(aes(x = dateOnly, y = LandMaxTemperature)) +
  geom_area(fill = "#69b3a2", alpha = 0.5) +
  geom_line(color = "#69b3a2") +
  ylab("landMaxtemperature") +
  theme_ipsum()
p = ggplotly(p)
## Warning: Removed 1200 rows containing missing values (position_stack).
p
In the interactive graph, the highest early peaks are 20.426 on 1854-07-01, 20.733 on 1877-07-01, and 20.553 on 1915-07-01. They all fall on the same month and day, which is expected: the data are monthly, dated the first of each month, and July is typically the warmest month for land as a whole because most land lies in the Northern Hemisphere. From the end of the 20th century the peaks rise more often: 20.972 on 1998-07-01, 21.199 on 2002-07-01, and 21.320 on 2011-07-01. The maximum temperature is rising quickly in the 21st century, which is consistent with global warming and its consequences.
This is only my own hypothesis: the heat of the Earth's core is balanced by the temperatures of the layers above it, and those in turn are balanced by the ice and water on the surface. As glaciers melt, the Earth itself may warm, and with carbon dioxide trapping the sun's heat on top of that, the situation may be more dangerous than predicted. Let’s look at the time series:
global.timeseries3 = ts(global$LandMaxTemperature, start = c(1750, 1), frequency = 12)  # monthly; NA values are left as gaps in the plot
plot(global.timeseries3)
The plot of the observed series shows an increasing trend; if it continues, maximum temperatures will keep rising over the coming centuries.
Let’s discuss the land and ocean average temperature, LandAndOceanAverageTemperature. There are 1200 NA values in the data. The minimum is 12.47. The first quartile (14.05), median (15.25), and third quartile (16.40) are about evenly spaced, which indicates a roughly symmetric distribution. The maximum is 17.61, which is quite noticeable. Let’s visualize the data:
plot(global$LandAndOceanAverageTemperature)
par(mfrow=c(1,1),bg="lavender")
barplot(global$LandAndOceanAverageTemperature,col="lightpink",ylim=c(0,20),ylab="temperature",xlab="Land and ocean average temperature distribution")
hist(global$LandAndOceanAverageTemperature, col = "pink")
The histogram shows two peaks, each resembling a normal curve.
p3 <- global %>%
  ggplot(aes(x = dateOnly, y = LandAndOceanAverageTemperature)) +
  geom_area(fill = "#69b3a2", alpha = 0.5) +
  geom_line(color = "#69b3a2") +
  ylab("landAndOceanAverageTemperature") +
  theme_ipsum()
p3 = ggplotly(p3)
## Warning: Removed 1200 rows containing missing values (position_stack).
p3
In the interactive graph, successive peaks are: 17.060 on 1866-07-01, 17.131 on 1941-07-01, 17.106 on 1945-08-01, 17.081 on 1951-08-01, 17.047 on 1977-07-01, 17.145 on 1983-08-01, 17.296 on 1987-07-01, 17.375 on 1995-07-01, 17.609 on 1998-07-01, 17.450 on 2001-07-01, 17.578 on 2009-07-01, and 17.611 on 2015-07-01.
The land and ocean temperature shows an increasing trend.
global.timeseries2 = ts(global$LandAndOceanAverageTemperature, start = c(1750, 1), frequency = 12)  # monthly; NA values are left as gaps in the plot
plot(global.timeseries2)
The time-series plot shows observed values, not a forecast. The graph is not very clear, but recent temperatures are higher than earlier ones and the overall direction is upward.
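Plotting a `ts` object shows past observations only. As a hedged illustration of what an actual forecast step could look like, the sketch below fits a linear trend plus a monthly seasonal term with `lm()` and predicts the next 12 months. This is a deliberately simple stand-in technique, not the method used in this analysis, and the data are synthetic.

```r
set.seed(1)
# Synthetic monthly data: seasonal cycle plus slow warming plus noise (invented, not the real series)
n <- 240
t_index <- seq_len(n)
month <- factor(rep(1:12, length.out = n))
temp <- 8 + 6 * sin(2 * pi * (t_index - 4) / 12) + 0.005 * t_index + rnorm(n, sd = 0.3)
train <- data.frame(t_index, month, temp)

# Linear trend plus one level per calendar month
fit <- lm(temp ~ t_index + month, data = train)

# The 12 months that follow the training period (n is a multiple of 12, so they are months 1..12)
future <- data.frame(
  t_index = n + 1:12,
  month = factor(1:12, levels = levels(month))
)

# Point forecasts with 95% prediction intervals (columns fit, lwr, upr)
fc <- predict(fit, newdata = future, interval = "prediction")
head(fc, 3)

# Estimated trend in degrees per month
coef(fit)[["t_index"]]
```

The prediction intervals matter as much as the point forecast: a straight-line trend extrapolated over centuries would carry far more uncertainty than this 12-month horizon.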
# New graph for analysing land average temperature in more detail, taking only every 100th row
new_maxLandAvrtemp = global$LandAverageTemperature[seq(1, nrow(global), 100)]
new_date = dateOnly[seq(1, length(dateOnly), 100)]
# Pair each sampled date with its temperature; merge() on two plain vectors has no key column and builds a cross join
z = data.frame(x = new_date, y = new_maxLandAvrtemp)
z
head(z)
tail(z)
# One observation every 100 months, starting January 1750
global.timeseries4 = ts(z$y, start = 1750, deltat = 100/12)
plot(global.timeseries4)
library(fpp2)
## Warning: package 'fpp2' was built under R version 4.0.2
## Loading required package: forecast
## Warning: package 'forecast' was built under R version 4.0.2
## Loading required package: fma
## Warning: package 'fma' was built under R version 4.0.2
## Loading required package: expsmooth
## Warning: package 'expsmooth' was built under R version 4.0.2
library(zoo)
movngAvrg = forecast::ma(global$LandAndOceanAverageTemperature, 120, centre = TRUE)
plot(movngAvrg)
The graph above is a 10-year (120-month) centered moving average. It shows an increasing trend in the land and ocean average temperature over the years.
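To make the smoothing step concrete, here is a sketch of the standard centered moving average for an even window, written with `stats::filter()`. Whether it matches `forecast::ma()` value for value is an assumption; the helper name `centered_ma` and the synthetic series are illustrative only, and a 12-month window is used instead of 120 to keep the example short.

```r
# Synthetic monthly series: seasonal cycle on top of a linear trend (invented data)
n <- 360
trend <- 10 + 0.01 * seq_len(n)
x <- trend + 5 * sin(2 * pi * seq_len(n) / 12)

# Centered moving average of even order k: half weights on the two end points
centered_ma <- function(x, k) {
  stopifnot(k %% 2 == 0)
  w <- c(0.5, rep(1, k - 1), 0.5) / k
  # stats::filter is named explicitly because dplyr masks filter()
  as.numeric(stats::filter(x, w, sides = 2))
}

smooth12 <- centered_ma(x, 12)

# The first and last k/2 values are NA because the window does not fit there
n_missing <- sum(is.na(smooth12))
n_missing

# A 12-month window cancels the seasonal cycle and leaves the trend
max_error <- max(abs(smooth12 - trend), na.rm = TRUE)
max_error
```

The same idea explains the plot above: a window that is a whole number of years removes the seasonal swing, so what remains is the slow movement of the series.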
p <- z %>%
  ggplot(aes(x = x, y = y)) +
  geom_area(fill = "#69b3a2", alpha = 0.5) +
  geom_line(color = "#69b3a2") +
  ylab("landAverageTemperature (every 100th row)") +
  theme_ipsum()
p = ggplotly(p)
p
The graph above is clearer. Sampled this way, the temperature changes very slowly over the years and the peaks are almost equal. This is most likely why climate change goes unnoticed, but as the trends above show, it is happening.
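A slow trend is easier to judge as a number than by eye. The sketch below, on invented monthly data rather than the real file, averages each calendar year to remove the seasonal swing and then fits a straight line to the annual means to get a rate in degrees per year:

```r
# Invented monthly data: 30 years with a seasonal cycle and a small upward drift
dates <- seq(as.Date("1986-01-01"), by = "month", length.out = 360)
step <- seq_along(dates)
temps <- 8 + 6 * sin(2 * pi * (step - 4) / 12) + 0.002 * step

# Average the 12 months of each calendar year
year <- as.integer(format(dates, "%Y"))
annual <- aggregate(temps, by = list(year = year), FUN = mean)
names(annual)[2] <- "mean_temp"

# Slope of a straight-line fit, in degrees per year
trend_per_year <- coef(lm(mean_temp ~ year, data = annual))[["year"]]
trend_per_year
```

On the real data the same steps would use `dateOnly` and one of the temperature columns, with `na.rm = TRUE` passed to `mean` for years that contain missing months.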